
    Modelling Conditions and Health Care Processes in Electronic Health Records: An Application to Severe Mental Illness with the Clinical Practice Research Datalink

    BACKGROUND: The use of Electronic Health Records databases for medical research has become mainstream. In the UK, increasing use of Primary Care Databases is largely driven by almost complete computerisation and uniform standards within the National Health Service. Electronic Health Records research often begins with the development of a list of clinical codes with which to identify cases with a specific condition. We present a methodology and accompanying Stata and R commands (pcdsearch/Rpcdsearch) to help researchers in this task, using severe mental illness (SMI) as an example. METHODS: We used the Clinical Practice Research Datalink, a UK Primary Care Database in which clinical information is largely organised using Read codes, a hierarchical clinical coding system. Pcdsearch is used to identify potentially relevant clinical codes and/or product codes from word-stubs and code-stubs suggested by clinicians. The returned code-lists are reviewed and codes relevant to the condition of interest are selected. The final code-list is then used to identify patients. RESULTS: We identified 270 Read codes linked to SMI and used them to identify cases in the database. We observed that our approach identified cases that would have been missed with a simpler approach using SMI registers defined within the UK Quality and Outcomes Framework. CONCLUSION: We described a framework for researchers using Electronic Health Records databases to identify patients with a particular condition or matching certain clinical criteria. The method is invariant to coding system or database and can be used with SNOMED CT, ICD or other medical classification code-lists.
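    The stub-search step is easy to picture in R. Below is a minimal sketch of the idea, assuming a hypothetical lookup table with columns readcode and description; the interfaces of the actual pcdsearch/Rpcdsearch commands may differ.

    # Sketch of word-stub / code-stub searching over a code dictionary.
    # 'lookup', 'word_stubs' and 'code_stubs' are illustrative names.
    search_codes <- function(lookup, word_stubs = character(0),
                             code_stubs = character(0)) {
      hits <- rep(FALSE, nrow(lookup))
      for (w in word_stubs)
        hits <- hits | grepl(w, lookup$description, ignore.case = TRUE)
      for (s in code_stubs)
        hits <- hits | startsWith(lookup$readcode, s)
      lookup[hits, ]
    }

    lookup <- data.frame(
      readcode    = c("E10..", "E11..", "H33..", "Eu20."),
      description = c("Schizophrenic disorders", "Affective psychoses",
                      "Asthma", "[X]Schizophrenia"),
      stringsAsFactors = FALSE
    )
    # Candidate SMI codes from clinician-suggested stubs; the returned
    # rows are then reviewed manually before the final code-list is fixed.
    search_codes(lookup, word_stubs = c("schizo", "psychos"),
                 code_stubs = c("Eu2"))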

    rEHR: An R package for manipulating and analysing Electronic Health Record data

    Research with structured Electronic Health Records (EHRs) is expanding as data become more accessible, analytic methods advance, and the scientific validity of such studies is increasingly accepted. However, data science methodology to enable the rapid searching/extraction, cleaning and analysis of these large, often complex, datasets is less well developed. In addition, commonly used software is inadequate, resulting in bottlenecks in research workflows and in obstacles to increased transparency and reproducibility of the research. Preparing a research-ready dataset from EHRs is a complex and time-consuming task requiring substantial data science skills, even for simple designs. In addition, certain aspects of the workflow are computationally intensive, for example extraction of longitudinal data and matching controls to a large cohort, which may take days or even weeks to run using standard software. The rEHR package simplifies and accelerates the process of extracting ready-for-analysis datasets from EHR databases. It has a simple import function to a database backend that greatly accelerates data access times. A set of generic query functions allow users to extract data efficiently without needing detailed knowledge of SQL queries. Longitudinal data extractions can also be made in a single command, making use of parallel processing. The package also contains functions for cutting data by time-varying covariates, matching controls to cases, unit conversion and construction of clinical code lists. There are also functions to synthesise dummy EHR data. The package has been tested with one of the largest primary care EHRs, the Clinical Practice Research Datalink (CPRD), but allows for a common interface to other EHRs. This simplified and accelerated workflow for EHR data extraction results in simpler, cleaner scripts that are more easily debugged, shared and reproduced.
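    The database-backend idea can be illustrated with plain DBI and RSQLite. The sketch below is not rEHR's actual interface; the file, table and helper names are assumptions.

    # Illustrative sketch of a database-backend workflow using DBI/RSQLite
    # directly; not rEHR's actual API. File and table names are assumed.
    library(DBI)
    library(RSQLite)

    con <- dbConnect(SQLite(), "ehr.sqlite")

    # One-off import of a flat-file extract; subsequent queries hit the
    # indexed database instead of re-reading large text files.
    clinical <- read.delim("clinical_extract.txt")
    dbWriteTable(con, "clinical", clinical, overwrite = TRUE)
    dbExecute(con, "CREATE INDEX idx_patid ON clinical (patid)")

    # A generic query helper so users need not hand-write SQL
    select_events <- function(con, table, codes) {
      sql <- sprintf("SELECT * FROM %s WHERE readcode IN (%s)",
                     table, paste(sprintf("'%s'", codes), collapse = ", "))
      dbGetQuery(con, sql)
    }

    smi_events <- select_events(con, "clinical", c("E10..", "Eu20."))
    dbDisconnect(con)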

    How to identify when a performance indicator has run its course

    Increasing numbers of countries are using indicators to evaluate the quality of clinical care, with some linking payment to achievement. For performance frameworks to remain effective the indicators need to be regularly reviewed. The frameworks cannot cover all clinical areas, and achievement on chosen indicators will eventually reach a ceiling beyond which further improvement is not feasible. However, there has been little work on how to select indicators for replacement. The Department of Health decided in 2008 that it would regularly replace indicators in the national primary care pay for performance scheme, the Quality and Outcomes Framework, making a rigorous approach to removal a priority. We draw on our previous work on pay for performance and our current work advising the National Institute for Health and Clinical Excellence (NICE) on the Quality and Outcomes Framework to suggest what should be considered when planning to remove indicators from a clinical performance framework.

    How do dataset characteristics affect the performance of propensity score methods and regression for controlling confounding in observational studies? A simulation study

    In observational studies, researchers must select a method to control for confounding. Options include propensity score methods and regression. It remains unclear how dataset characteristics (size, overlap in propensity scores, exposure prevalence) influence the relative performance of the methods, making it difficult to select the best method for a particular dataset. We conducted a simulation study to evaluate how dataset characteristics affect the performance of propensity score methods, compared with logistic regression, for estimating a marginal odds ratio in the presence of confounding. Outcomes were simulated from logistic and complementary log-log models, and size, overlap in propensity scores, and prevalence of the exposure were varied. Regression showed poor coverage for small sample sizes, but with large sample sizes it was more robust to imbalance in propensity scores and low exposure prevalence than were propensity score methods. Propensity score methods frequently displayed suboptimal coverage, particularly as overlap in propensity scores decreased. These problems were exacerbated at larger sample sizes. Power of matching methods was particularly affected by lack of overlap, low prevalence of exposure, and small sample size. Performance of inverse probability of treatment weighting depended heavily on dataset characteristics, with poor coverage and bias with low overlap. The advantage of regression for large data size was less clear in sensitivity analyses with a complementary log-log outcome-generation mechanism and unmeasured confounding, where regression showed superior bias and error but lower coverage than nearest-neighbour and caliper matching.
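    One replicate of such a simulation is straightforward in base R. The sketch below is a minimal illustration, not the authors' simulation code; varying the sample size, the confounder's effect on exposure (overlap) and the exposure-model intercept (prevalence) reproduces the factors studied.

    # One illustrative replicate: confounded binary exposure, logistic
    # outcome, marginal OR by IPTW versus covariate-adjusted regression.
    set.seed(1)
    n <- 2000
    x <- rnorm(n)                                      # confounder
    a <- rbinom(n, 1, plogis(-1 + 1.0 * x))            # exposure; intercept sets prevalence
    y <- rbinom(n, 1, plogis(-2 + 0.5 * a + 1.0 * x))  # outcome

    # Inverse probability of treatment weighting
    ps <- fitted(glm(a ~ x, family = binomial))
    w  <- ifelse(a == 1, 1 / ps, 1 / (1 - ps))
    iptw <- glm(y ~ a, family = quasibinomial, weights = w)  # quasibinomial avoids weight warnings

    # Covariate-adjusted logistic regression; note this targets a
    # conditional rather than marginal odds ratio
    reg <- glm(y ~ a + x, family = binomial)

    exp(c(IPTW = coef(iptw)["a"], regression = coef(reg)["a"]))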

    Can analyses of electronic patient records be independently and externally validated? The effect of statins on the mortality of patients with ischaemic heart disease: a cohort study with nested case-control analysis

    Objective To conduct a fully independent and external validation of a research study based on one electronic health record database, using a different electronic database sampling the same population. Design Using the Clinical Practice Research Datalink (CPRD), we replicated a published investigation into the effects of statins in patients with ischaemic heart disease (IHD) by a different research team using QResearch. We replicated the original methods and analysed all-cause mortality using: (1) a cohort analysis and (2) a case-control analysis nested within the full cohort. Setting Electronic health record databases containing longitudinal patient consultation data from large numbers of general practices distributed throughout the UK. Participants CPRD data for 34 925 patients with IHD from 224 general practices, compared with previously published results from QResearch for 13 029 patients from 89 general practices. The study period was from January 1996 to December 2003. Results We replicated the methods of the original study very closely. In a cohort analysis, risk of death was lower by 55% for patients on statins, compared with 53% for QResearch (adjusted HR 0.45, 95% CI 0.40 to 0.50; vs 0.47, 95% CI 0.41 to 0.53). In case-control analyses, patients on statins had a 31% lower odds of death, compared with 39% for QResearch (adjusted OR 0.69, 95% CI 0.63 to 0.75; vs OR 0.61, 95% CI 0.52 to 0.72). Results were also close for individual statins. Conclusions Database differences in population characteristics and in data definitions, recording, quality and completeness had a minimal impact on key statistical outputs. The results uphold the validity of research using CPRD and QResearch by providing independent evidence that both datasets produce very similar estimates of treatment effect, leading to the same clinical and policy decisions. Together with other, non-independent replication studies, these findings form a nascent body of evidence for wider validity.
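    For readers unfamiliar with the two designs, the sketch below shows their shape in R with the survival package; the data frames and column names (ihd, ihd_ncc, set_id, and so on) are hypothetical, not the study's actual variables.

    library(survival)

    # (1) Cohort analysis: adjusted hazard ratio for statin exposure
    cox <- coxph(Surv(followup_years, died) ~ statin + age + sex, data = ihd)
    exp(coef(cox))["statin"]   # adjusted HR (0.45 in the CPRD replication)

    # (2) Nested case-control: conditional logistic regression within
    # matched sets; 'set_id' links each case to its sampled controls
    ncc <- clogit(case ~ statin + age + strata(set_id), data = ihd_ncc)
    exp(coef(ncc))["statin"]   # adjusted OR (0.69 in the CPRD replication)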

    Analysing indicators of performance, satisfaction, or safety using empirical logit transformation

    Performance, satisfaction, and safety indicators are commonly measured on a percentage scale. Such indicators are often subject to ceiling or floor effects and performance may be inherently non-linear. For example, improving from 85% to 95% might be more difficult and need more effort than improving from 55% to 65%. As such, analysis of these indicators is not always straightforward and standard linear analysis could be problematic. We present the most common approach to dealing with this problem: a logit transformation of the score, following which standard linear analysis can be conducted on the transformed score. We also demonstrate how estimates can be back-transformed to percentages for easier communication of findings. In this paper, we discuss the benefits of this method, use algebra to describe the relevant steps in the transformation process, provide guidance on interpretation, and provide a tool for analysis. UK Medical Research Council Health eResearch Centre grant MR/K006665/1 supported the time and facilities of EK.
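    A minimal R sketch of the transformation and back-transformation, using the common 0.5 adjustment so that scores of 0% and 100% remain finite (illustrative, not the paper's exact tool):

    # Empirical logit for an indicator scored x successes out of n
    emp_logit <- function(x, n) log((x + 0.5) / (n - x + 0.5))

    x <- c(55, 85, 95); n <- rep(100, 3)
    z <- emp_logit(x, n)          # transformed scores for linear analysis

    # Back-transform fitted values to percentages for reporting
    round(100 * plogis(z), 1)     # approximately recovers 55%, 85%, 95%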

    Excess mortality in England and Wales during the first wave of the COVID-19 pandemic

    Background: Deaths during the COVID-19 pandemic result directly from infection and exacerbation of other diseases and indirectly from deferment of care for other conditions, and are socially and geographically patterned. We quantified excess mortality in regions of England and Wales during the pandemic, for all causes and for non-COVID-19-associated deaths. Methods: Weekly mortality data for 1 January 2010 to 1 May 2020 for England and Wales were obtained from the Office for National Statistics. Mean-dispersion negative binomial regressions were used to model death counts based on pre-pandemic trends and exponentiated linear predictions were subtracted from: (i) all-cause deaths and (ii) all-cause deaths minus COVID-19 related deaths for the pandemic period (week starting 7 March, to week ending 8 May). Findings: Between 7 March and 8 May 2020, there were 47 243 (95% CI: 46 671 to 47 815) excess deaths in England and Wales, of which 9948 (95% CI: 9376 to 10 520) were not associated with COVID-19. Overall excess mortality rates varied from 49 per 100 000 (95% CI: 49 to 50) in the South West to 102 per 100 000 (95% CI: 102 to 103) in London. Non-COVID-19 associated excess mortality rates ranged from −1 per 100 000 (95% CI: −1 to 0) in Wales (ie, mortality rates were no higher than expected) to 26 per 100 000 (95% CI: 25 to 26) in the West Midlands. Interpretation: The COVID-19 pandemic has had markedly different impacts on the regions of England and Wales, both for deaths directly attributable to COVID-19 infection and for excess deaths not associated with COVID-19.
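    The modelling step can be sketched in R with MASS::glm.nb; the weekly data frame and its columns below are hypothetical, and the seasonal terms are one simple choice among several.

    library(MASS)

    # Hypothetical data frame 'weekly' with columns week_date and deaths
    weekly$t   <- as.numeric(weekly$week_date)
    weekly$woy <- as.numeric(format(weekly$week_date, "%U"))
    pre  <- subset(weekly, week_date <  as.Date("2020-03-07"))
    pand <- subset(weekly, week_date >= as.Date("2020-03-07"))

    # Negative binomial fit to pre-pandemic trend plus annual seasonality
    fit <- glm.nb(deaths ~ t + sin(2 * pi * woy / 52) + cos(2 * pi * woy / 52),
                  data = pre)

    # Exponentiated linear prediction = expected deaths; excess = observed - expected
    expected <- predict(fit, newdata = pand, type = "response")
    sum(pand$deaths - expected)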